
APPLICATION FOR UNITED STATES PATENT 
Inventor: Marcia K. Wolf 

This application is a continuation of patent application USSN
08/788,145 filed January 24, 1997, which is a continuation-in-part
of patent application USSN 243,482 filed May 13, 1994, which is now
abandoned.
Field of the Invention: 

This invention is related to a CS6 antigen for use in vaccines
to protect from pathological effects of enterotoxigenic E. coli.
Background of the Invention:

CS6 is a component of CFA/IV (colonization factor antigen IV),
one of three CFAs commonly found on enterotoxigenic Escherichia
coli (ETEC). A recent study showed CS6 on 31% of ETEC isolated
from soldiers in the Middle East. Other CFAs and similar proteins
found on the surface of ETEC function as adhesins to attach
bacteria to intestinal epithelial cells. Attached bacteria can
then deliver their toxin(s) to the target cells. It has never been
proved that CS6 is an adhesin for human tissue (Knutton, S., M. M.
McConnell, B. Rowe, and A. S. McNeish, "Adhesion and Ultrastructural
Properties of Human Enterotoxigenic Escherichia coli Producing
Colonization Factor Antigens III and IV", Infect. Immun. 57:3364-3371
(1989)), but a study in rabbits indicated CS6 is a colonization
factor.

The CS6 operon has much in common with fimbrial operons from
E. coli, Salmonella, Yersinia, Klebsiella, Haemophilus, and
Bordetella. All contain molecular chaperones and ushers and a
number of structural subunits. This area contains two sequences
homologous to insertion sequences, but no complete insertion
sequences.

The low GC content (34%) and codon usage that is characteristic
of E. coli genes that are expressed at low levels suggest the
CS6 genes may have originated in another species. GC content of
35-45% is characteristic of Gram positive bacteria such as
Staphylococcus, Streptococcus, Bacillus, and Lactobacillus. Low GC
content is common for virulence-associated genes of E. coli.
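
By way of illustration only, a GC content figure of the kind cited above can be obtained by counting bases. The following is a minimal sketch and not part of the described method: the function name gc_content is hypothetical, and the 50-base fragment is simply the first row of the sequence listing given later in this application, which is far too short to reproduce the 34% figure reported for the CS6 genes as a whole.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the A/C/G/T bases of a sequence."""
    # Ignore whitespace, digits and any unreadable characters.
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for b in bases if b in "GC")
    return gc / len(bases)


# First 50 bases of the sequence listing, used only as a short example.
fragment = "AAGCTTGTAACCAGTTGATAAAAATATATCACGCTGGGAATGACGTGATG"
print(f"GC content of fragment: {gc_content(fragment):.0%}")
```

Applied to a complete, error-free copy of the operon sequence, the same function would give a value that could be compared with the 35-45% range mentioned above.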

CS6 is unusual because it is expressed on bacteria grown on a
variety of media, unlike other CFAs from ETEC that are only
expressed on bacteria grown on CFA agar. This unusual regulation
is not peculiar to strain E8775 because ETEC isolated in 1990
expressed CS6 when grown on L agar. Temperature regulation of CS6
expression is characteristic of other CFAs from ETEC and virulence
genes in a variety of pathogenic bacteria.

Although CS6 has never been visualized by negative staining,
electron microscopy using anti-CS6 sera and colloidal gold
indicated that it is present on the surface of ETEC. The apparent
major protein associated with CS6 is approximately 16 kDa, which is
in the range of molecular weights typical for subunits of fimbriae
and fibrillae. CS6 from ETEC strain E10703 of serotype O167:H5 has
been cloned (Willshaw, et al., FEMS Microbiol. Lett. 49:473-478
(1988)). Only 3 kb of DNA was necessary for expression of CS6.
That is in contrast to fimbriae that typically require approximately
9 kb of DNA and include genes for subunits as well as proteins
for transport of subunits and synthesis and assembly on the
bacterial surface.

Grewal teaches bacterial strains transformed with plasmids 
containing genes which encode CS6. However, that reference does 
not teach use of plasmids under the controls of a lac promoter and 
a CS6 promoter. 

Brief Description of the Drawings: 

Figure 1 shows the restriction sites and the location of the
pertinent genes that make up the CS6 operon. 

Figure 2 shows derivation of the clone containing the 
kanamycin resistance gene. 
Description of the Invention: 

It is the purpose of this invention to provide structural
proteins which will act as antigens to stimulate protective
antibodies against enterotoxigenic Escherichia coli. Particularly
important are proteins having the antigenic properties of the 
proteins encoded by the cssA and cssB genes. Constructs may be 
prepared which encode either one or both of the proteins. However, 
both proteins would be needed to provide desirable protection. The 
CS6 operon includes four genes which we designate cssA, cssB, cssC,
and cssD. cssA and cssB encode the structural proteins of CS6.
The CS6 operon has much in common with fimbrial operons from E.
coli, Salmonella, Yersinia, Klebsiella, Haemophilus, and Bordetella.
All contain molecular chaperones and ushers and a number of
structural subunits. In a preferred embodiment, plasmids containing
all four genes are transformed into attenuated bacteria, which are
then given by mouth to prevent morbidity arising from infection
with E. coli.

CS6 has two major subunits; protein sequencing data demonstrate
that CS6A and CS6B are both present. The DNA sequence
suggests a mechanism for expression of similar amounts of the two
proteins. The CS6 operon contains DNA immediately downstream of
cssB which can form a stem-loop with a stem rich in G and C, a
structure that commonly acts as a transcription terminator.
Termination at this site yields a transcript with cssA and cssB such
that CssA and CssB proteins would be translated in equal amounts.
Fimbrial operons for Pap, K99, and K88 have stem loops immediately
downstream of the genes for the major structural subunits. This has
been offered as a mechanism for overexpression of subunit genes
relative to other genes in the operons. In the case of CS6, this
would allow overexpression of both CS6A and CS6B.

The occurrence of two major structural proteins is unusual
because fimbriae have a single major subunit and a number of
minor subunits. CS3, which has been designated fibrillar rather
than fimbrial, is an exception to this generality because it has 2
subunits. CssD belongs to the family of molecular ushers located
in the outer membrane that accept subunits from the chaperone and
escort them to the bacterial surface. Apparently the entire cssD
gene is not necessary for CS6 expression since CS6 was detected
from clones carrying pDEP5 which only contains the N-terminal
one-third of cssD. Klemm and Christiansen found that mutations in the
usher for Type 1 fimbriae reduced fimbriation but 10% of the
bacteria produced a few fimbriae (Mol. Gen. Genet. 220:334-338).

The CS6 proteins are produced in the transformed bacteria and 
are present on the exterior surface of the bacteria. These 
proteins give rise to immunological response in the host. For 
immunization, the bacteria may be given either dead or alive. When 
attenuated bacteria have been transformed, the bacteria can be 
given live in mildly basic carriers. Economical and readily 
available carriers include carbonated water which may be flavored. 
The administration of the transformed bacteria in carbonated
beverages is particularly useful, since the means necessary for
administration is widely available.

In a preferred embodiment, the products are produced under the
control of a lac promoter from pUC19. In the preferred embodiment,
a vector pM346 containing a kanamycin resistance gene makes it
possible to provide products which are appropriate for use in
humans.

The CS6 proteins may also be extracted from the supernatant of 
the culture containing the organisms which express the proteins. 
The proteins may then be administered orally. The proteins may be 
formulated by means known in the art, including microencapsulation, 
coated capsules and liposomes. The proteins may be lyophilized 
before formulation. 
MATERIALS AND METHODS 
Source of nucleic acid 

The genes for CS6 expression were from enterotoxigenic
Escherichia coli (ETEC) strain E8775 tox- of serotype O25:H42 which
was a gift from Alejandro Cravioto. E8775 tox- is a derivative of
E. coli strain E8775 which was originally isolated from Bangladesh.
E. coli DH5α was purchased from Bethesda Research Laboratories, Inc.,
Gaithersburg, MD. pUC19 was originally purchased from P-L Biochem.
The antibiotic resistance gene encodes resistance to kanamycin
and was purchased from Pharmacia, Uppsala, Sweden (KanR GenBlock®).

CS6 expression is regulated from its native promoter. That is 
demonstrated by retention of control by growth temperature and is 
consistent with the DNA sequence determined from the clone. A 
contribution of the lac promoter from pUC19 is undefined. The 
contribution of increased copy number of the plasmid is probably 
substantial.

The nucleotide sequence containing the coding region, as
determined for the construct containing the kanamycin resistance
gene, was as follows:
1 AAGCTTGTAA CCAGTTGATA AAAATATATC ACGCTGGGAA TGACGTGATG
51 TATATACGGA GCAGCTATGT CGGAACAGAT ATTTTCCTAT CGGTATGCGT
101 TGTGAGTAAG CGTAAAGCCA ATGCTGTCTG TAACTCCTGA TCCTTGCAGA
151 CTAAATTAGA GCTCCTTCTA AATTAGACGG ATGGATAAAC CTACAGACTG
201 GCGCTCTGGG TCTCGCCGGA TATTTTCTAA TGAATTTAAG CTTCATATGG
251 TTGAACTGGC TTCGAAACCA AATGCCAATG TCGCACAACT GGCTCGGGAA
301 CATGGCGTTG ATAACAACCT GATTTTTAAA TAGCTACGCC TCTGGCAAAG
351 AGAAGGACGT ATTTCTCGTA GAATGCCTCC AACTATTGTA GGCCCTACAG
401 TACCACTGAG GTAGCCTGAA TTTAAAGCCG AAGCGGTCAG AACTGTTCTT
451 GGTGTGAACG TAGCACTCAC CAATAAAA<3C ATCAATACGG TGCTCTGTTG
501 ACACATTACG AATGTTATGT ATACAATAAA AATGATTATA GCAATATTAA
551 TGGTGTTATA TGAAGAAAAC AATTGGTTTA ATTCTAATTC TTGCTTCATT
601 CGGCAGCCAT GCCAGAACAG AAATAGCGAC TAAAAACTTC CCAGTATCAA
651 CGACTATTTC AAAAAGTTTT TTTGCACCTG AACCACGAAT ACAGCCTTCT
701 TTTGGTGAAA ATGTTGGAAA GGAAGGAGCT TTATTATTTA GTGTGAACTT
751 AACTGTTCCT GAAAATGTAT CCCAGGTAAC GGTCTACCCT GTTTATGATG
801 AAGATTATGG GTTAGGACGA CTAGTAAATA CCGCTGATGC TTCCCAATCA
851 ATAATCTACC AGATTGTTGA TGAGAAAGGG AAAAAAATGT TAAAAGATCA
901 TGGTGCAGAG GTTACACCTA ATCAACAAAT AACTTTTAAA GCGCTGAATT
951 ATACTAGCGG GGAAAAAAAA ATATCTCCTG GAATATATAA CGATCAGGTT
1001 ATGGTTGGTT ACTATGTAAA CTAAATACTG GAAGTATGAT TATGTTGAAA
1051 AAAATTATTT CGGCTATTGC ATTAATTGCA GGAACTTCCG GAGTGGTAAA
1101 TGCAGGAAAC TGGCAATATA AATCTCTGGA TGTAAATGTA AATATTGAGC
1151 AAAATTTTAT TCCAGATATT GATTCCGCTG TTCGTATAAT ACCTGTTAAT
1201 TACGATTCGG ACCCGAAACT GGATTCACAG TTATATACGG TTGAGATGAC
1251 GATCCCTGCA GGTGTAAGCG CAGTTAAAAT CGCACCAACA GATAGTCTGA
1301 CATCTTCTGG ACAGCAGATC GGAAAGCTGG TTAATGTAAA CAATCCAGAT
1351 CAAAATATGA ATTATTATAT CAGAAAGGAT TCTGGCGCTG GTAACTTTAT
1401 GGCAGGACAA AAAGGATCCT TTCCTGTCAA AGAGAATACG TCATACACAT
1451 TCTCAGCAAT TTATACTGGT GGCGAATACC CTAATAGCGG ATATTCGTCT
1501 GGTACTTATG CAGGAAATTT GACTGTATCA TTTTACAGCA ATTAAAAAAA
1551 GGCCGCATTA TTGCGGCCAT TGACGATACT GCTAGGCAAA AATATGAAAT
1601 CAAAGTTAAT TATACTATTG ACGTTAGTGC CATTTTCATC TTTTTCAACA
1651 GGAAATAATT TTGAAATAAA TAAGACACGA GTAATTTACT CTGACAGCAC
1701 ACCATCAGTT CAAATATCAA ATAATAAAGC ATATCCTTTA ATTATTCAAA
1751 GCAATGTATG GGATGAAAGC AATAATAAAA ATCATGACTT TATAGCAACA
1801 CCACCGATTT TTAAAATGGA AAGTGAAAGT CGGAATATAA TAAAAATAAT
1851 TAAAACAACT ATTAATTTGC CGGACTCTCA GGAAAGTATG AGATGGTTAT
1901 GTATTGAATC AATGCCACCA ATAGAAAAAA GTACTAAAAT AAACAGAAAA
1951 GAAGGAAGGA CAGACAGTAT TAATATCAGC ATTCGGGGGT GCATTAAACT
2001 GATATATCGA CCTGCCAGTG TTCCGTCTCC TGTTTTTAAT AATATAGTAG
2051 AAAAATTAAA ATGGCATAAA AATGGAAAGT ATCTTGTATT AAAAAATAAT
2101 ACACCCTATT ACATTAGCTT TTCTGAGGTT Tl-TTTTGATT CAGATAAAGT
2151 AAACAATGCA AAAGATATTT TATATGTAAA ACCATACTCA GAGAAGAAAA
2201 TAGATATCAG CAACAGAATA ATAAAAAAAA TCAAATGGGC TATGATTGAT
2251 GATGCTGGCG CAAAAACAAA ACTTTATGAA TCAATTTTAT AAAAAATCTC
2301 ATTACAGTAT ACAAAAACAT CAGATTACAG GCTTGCTTTT TTTGCTATTT
2351 ATATATCCTT TCTCAACCTC ATATGGAAAT GAACAATTTA GTTTTGACTC
2401 ACGATTCCTA CCATCAGGTT ATAATTACTC TTTAAATAGT AACTTACCTC
2451 CTGAAGGTGA GTATCTGGTT GATATTTATA TTAACAAAAT AAAAAAGGAG
2501 TCCGCGATTA TTCCTTTTTA TATAAAAGGA AATAAACTTG TACCATGTTT
2551 ATCAAAAGAA AAAATTTCAT CTTTGGGTAT CAACATTAAT AATAACGACA
2601 ACACAGAGTG TGTAGAAACA AGTAAGGCAG GTATTAGTAA TATCAGCTTT
2651 GAGTTTAGCT CTCTTCGTTT GTTTATTGCT GTACCGAAAA ATCTTCTGTC
2701 TGAGATTGAT AAAATATCAT CAAAGGATAT AGATAACGGG ATTCATGCTT
2751 TATTTTTTAA TTATCAAGTA AATACAAGGC TAGCCAATAA TAAAAATCGT
2801 TATGATTACA TTTCTGTTTC ACCAAATATA AATTATTTTT CATGGCGGTT
2851 GCGTAATCTT TTTGAATTTA ACCAAAACAA CGATGAAAAA ACATGGGAAA
2901 GAAACTACAC TTATCTAGAA AAAAGTTTTT ATGATAAAAA GCTAAACTTA
2951 GTCGTTGGTG AAAGTTATAC GAATTCAAAT GTTTATAATA ACTACTCTTT
3001 TACTGGTATT TCAGTTTCTA CAGATACAGA TATGTATACG CCAAGTGAAA
3051 TCGATTATAC ACCAGAAATT CATGGAGTGG CTGATTCAGA CTCTCAGATT
3101 ATTGTCAGGC AAGGCAACAC CATTATCATT AATGAAAGTG TTCCAGCCGG
3151 [illegible] [illegible] [illegible] GTATACTGGG GGGCAACTTA
3201 [illegible] [illegible] [illegible] AAAAACAATA TACTGTCAAT
3251 [illegible] [illegible] GAGAAAAGCG GGACTAATGG TATATAATTT
3301 [illegible] [illegible] AAAAAAATAG TGAGGATGGT GATTTTTTTA
3351 [illegible] [illegible] [illegible] ATAACAGCAC ACTATTCGGT
3401 [illegible] [illegible] [illegible] TTATCTACTG GTATAGGCAC
3451 [illegible] [illegible] [illegible] ACACGTTAGC AGAAGTAATT
3501 TTAAGAATAA [illegible] [illegible] TACAACAAAA CACTCAGTTA
3551 [illegible] [illegible] [illegible] TACGCATACA GAAAAAAAAG
3601 [illegible] [illegible] [illegible] TGGTAATTTA TATAATCAAC
3651 [illegible] [illegible] [illegible] AATCATTGAA TAAATACGGA
3701 [illegible] [illegible] CAAAATGAAA TACTGGGATA ATGCGTATGA
3751 [illegible] [illegible] [illegible] TAAATTCATG CGAGCAATGA
3801 [illegible] [illegible] [illegible] AATCTTATGA AAAAAAAGAT
3851 [illegible] [illegible] [illegible] TTAACCAAAG ATTACGGGCA
3901 [illegible] [illegible] [illegible] AAATACAGGA ACGGCAACCA
3951 [illegible] [illegible] AGTTTTTTTA ATGACGCAAG ATTAAACTGG
4001 [illegible] [illegible] GACCCGTAAC AATGGATATA CTGATAATAC
4051 [illegible] [illegible] [illegible] CTATGGCGTT TTTACTGGTT
4101 [illegible] [illegible] TATTCAAGCC AGl'TTTATTC TGCATCGGGA
4151 [illegible] [illegible] TGGCGTAGCT TTTACTCAAA AAGCCGGAGA
4201 [illegible] [illegible] TTGATAATAT TTCTGATATA AAAATTGGTA
4251 [illegible] [illegible] GGGTATAATG GTTTTGCTTT AATTCCTCAT
4301 CTTCAGCCGT TCAAAAAAAA CACCATTTTA ATTAATGATA AAGGAATTCC
4351 AGACGGTATT ACTCTTGCTA ATATAAAAAA ACAAGTTATC CCATCACGAG
4401 GAGCTATTGT TAAAGTAAAA TTTGATGCTA AAAAAGGCAA TGACATTTTG
4451 TTTAAGCTTA CAACTAAAGA TGGAAAAACG CCCCCATTAG GAGCTATAGC
4501 CCATGAAAAA AATGGAAAAC AGATTAATAC GGGTATCGTT GACGATGATG
4551 GTATGCTTTA TATGTCTGGA TTATCAGGGA CAGGGATTAT TAATGTAACA
4601 TGGAATGGAA AAGTCTGTTC ATTTCCTTTT TCAGAAAAAG ATATATCTAG
4651 CAAACAATTA TCTGTTGTAA ATAAACAATG TTAGGTAGTG CATCCAATTA
4701 GTAGAACATG TGTTTTTCGA TAAACGCTCC GATCTCTTTT TCGTGGATCT
4751 CAACTGAGCG TGAGAAGCAG ATTGTTTTAC GAGCCAACCG CTTAATGCGG
4801 GTGCGTAGCG TCAGATTATT ACGCTCAATG CGTTGGGTGA ATATTTTGCC
4851 GGTCAGATGC TTATTCTTCG GTACC Sequence ID No. 1
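
For readers who wish to work with the listing electronically, rows of the form shown above (a position label followed by blocks of ten bases) can be joined into one continuous string. The following is a minimal sketch under stated assumptions: the function name parse_listing is hypothetical, only rows consisting of clean A, C, G and T blocks are handled, and the two sample rows are the final rows of Sequence ID No. 1.

```python
import re


def parse_listing(text):
    """Join numbered rows of base blocks into (first_position, sequence)."""
    first_position = None
    next_position = None
    pieces = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or not tokens[0].isdigit():
            continue  # skip blank lines and lines without a position label
        start = int(tokens[0])
        row = ""
        for token in tokens[1:]:
            if not re.fullmatch(r"[ACGT]+", token):
                break  # stop at trailing text such as "Sequence ID No. 1"
            row += token
        if not row:
            continue
        if first_position is None:
            first_position = next_position = start
        if start != next_position:
            # A mismatch means a row is missing, duplicated or damaged.
            raise ValueError(f"row labelled {start}, expected {next_position}")
        pieces.append(row)
        next_position += len(row)
    return first_position, "".join(pieces)


sample = """
4801 GTGCGTAGCG TCAGATTATT ACGCTCAATG CGTTGGGTGA ATATTTTGCC
4851 GGTCAGATGC TTATTCTTCG GTACC Sequence ID No. 1
"""
first, seq = parse_listing(sample)
print(first, first + len(seq) - 1, seq[-6:])
```

The position check is the useful part of the sketch: a row whose label does not follow from the preceding rows is reported rather than silently joined.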

B. Cell expression clone:

E. coli HB101 was purchased from the American Type Culture
Collection, Rockville, Maryland. It is ATCC #33694 and batch #91-1.
(Escherichia coli ATCC 33694)

Preceptrol [Reg TM] culture. D. Ish-Horowicz and J.F. Burke HB101
<-- H. Boyer. Genotype: F- leuB6 proA2 recA13 thi-1 ara-14 lacY1
galK2 xyl-5 mtl-1 rpsL20 supE44 hsdS20 (r-B m-B at least thi-hsd
from Escherichia coli B). Produces isoprene (Curr. Microbiol.
30:97-103, 1995). (J. Mol. Biol. 41:459-472, 1969; Methods Enzymol.
68:245-267, 1979.) Growth Conditions: Medium 1065 37C.

The plasmid containing the CS6 genes, the pUC19 origin of
replication, and the gene for kanamycin resistance was transferred
into E. coli HB101 by transformation. Transformants were selected
by growth on L agar supplemented with 0.04% Xgal with 50 µg per ml
kanamycin sulfate and/or 50 µg per ml ampicillin.

One copy of the CS6 genes exists as an extrachromosomal
plasmid of high (500-700) copy number. The CS6 genes are present
on a plasmid, not integrated into the chromosome. The plasmid has
been isolated from the strain and examined by agarose gel
electrophoresis.

Plasmid DNA from E8775 tox- was transferred to laboratory
strain DH5α as a cointegrate with F' lac::Tn5, a conjugative
plasmid. Transfer of the F' lac::Tn5 plasmid was selected by
antibiotic resistance to kanamycin and CS6 expression was detected
by Western blot using polyclonal antisera specific for CS6.
Plasmids were isolated and a cointegrate was identified based on
the large size. A spontaneous derivative in which the F' lac::Tn5
was removed was obtained and named M56. M56 contains a
61-megadalton plasmid from E8775 tox- and expresses CS6. Plasmid DNA
from M56 was isolated, partially digested with restriction enzyme
HindIII, and ligated to pUC19 that had been digested with HindIII.
The ligation mixture was transformed into DH5α and plated onto L
agar supplemented with ampicillin and X-gal. White (lac-) colonies
were picked to CFA plates supplemented with ampicillin and tested
for CS6 expression.

A stable clone named M233 with an insert of approximately 24
kb into the cloning site of pUC19 was obtained. It was a spontaneous
deletion of a larger clone. Subclones were obtained by
digestion with various enzymes and a subclone containing approximately
5 kb from the HindIII site to the KpnI site was found that
expressed CS6. This clone was designated M285. Expression of CS6
was verified by transferring plasmids into E. coli strain HB101 and
detecting CS6 expression. The cloned CS6 is expressed under the
same conditions as CS6 from the native 61-megadalton plasmid: CS6
was detected in extracts from bacteria grown at 37 °C on CFA agar,
L agar or MacConkey agar. CS6 was not expressed on bacteria grown
at 17 °C.
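
As an illustration of how the boundaries of such a subclone can be checked against a sequence, the recognition sites of HindIII (AAGCTT) and KpnI (GGTACC) can be located by a plain string search. The sketch below is not the procedure used for the work described here: the function name find_sites is hypothetical, and the demonstration string is a toy sequence made by joining the first 20 and the last 15 bases of Sequence ID No. 1.

```python
# Recognition sequences of the two enzymes named in the text.
SITES = {"HindIII": "AAGCTT", "KpnI": "GGTACC"}


def find_sites(seq, site):
    """Return the 1-based start positions of every occurrence of a site."""
    positions = []
    index = seq.find(site)
    while index != -1:
        positions.append(index + 1)
        # Restart one base later so that overlapping sites are also found.
        index = seq.find(site, index + 1)
    return positions


# Toy sequence (an assumption for illustration, not the real insert).
demo = "AAGCTTGTAACCAGTTGATA" + "TTATTCTTCGGTACC"
for enzyme, site in SITES.items():
    print(enzyme, find_sites(demo, site))
```

Run on the complete insert, a search of this kind would also report any internal sites, which is relevant when a fragment has been obtained by partial digestion.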

Studies were performed to determine appropriate handling of 
strain M285 for reproducible expression of CS6. Growth temperature 
was found to be especially important. 

As indicated above, the protein sequence of the N-terminus of 
CS6 was determined from strains E8775 and from M233, the large 
clone derived from E8775. The 16 kDa proteins recovered from heat, 
saline extracts, and ammonium sulfate precipitation of M233 yielded 
two amino acids at each position (except cycle 12) indicating that 
two proteins were present. From the strength of the two signals, 
a probable primary sequence and a probable secondary sequence call 
was made for each of fifteen cycles. Quantitative analysis of the 
peak areas indicated that the molar ratio of the primary sequence 
(CS6A) to secondary sequence (CS6B) was approximately 3:1. The 
presence of the same two proteins was evident from strain E8775 
grown on CFA agar and on L agar. 

The DNA sequence of the DNA inserted into pUC19 in clone M285
was determined. Wim Gaastra's group in the Netherlands independently
determined the DNA sequence of CS6 genes from ETEC strain
E10703. The DNA sequences are available from GenBank accession
numbers U04846 and U04844, respectively. A stretch of DNA of 4,219
base pairs was 98% identical. The DNA sequences diverge abruptly
on both sides of the common region, defining the limits of the CS6
genes. Four open reading frames were detected within the common
area. These were designated cssA, cssB, cssC, and cssD.
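
The idea of detecting open reading frames can be illustrated with a short scan for a start codon followed in the same frame by a stop codon. This is a generic sketch only; it does not reproduce the software actually used for the analysis, it examines the forward strand alone, and the function name find_orfs and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) 1-based positions of ATG...stop on the given strand.

    min_codons counts the codons from ATG up to, but not including, the stop.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs


# Toy example: ATG AAA TTT GGG TAA embedded in flanking bases.
toy = "CCATGAAATTTGGGTAACC"
print(find_orfs(toy))
```

In practice a much larger minimum length would be used, and candidate frames would be checked for upstream ribosome binding sequences as described in the following paragraph.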

The four open reading frames are preceded by consensus
sequences for binding RNA polymerase and ribosomes. The first open
reading frame, cssA, was identified as the gene for the CS6
structural protein designated as the primary protein based on the
amino acid N-terminal sequence. The deduced molecular weight
agrees with that previously determined from SDS-PAGE. cssA
includes a signal sequence that is typical for many exported
proteins. Eleven of 136 residues differ between the deduced CssA
proteins from E8775 and from E10703.

cssB begins 17 bases downstream from cssA. There is a typical
signal sequence. cssB was identified as the gene for the CS6
structural protein designated as the secondary protein based on the
amino acid N-terminal sequence. Five of 146 residues differ
between the deduced CssB proteins from E8775 and from E10703.
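
The residue differences reported for the two structural proteins can be restated as percent identity. The short calculation below is offered only as a worked example; the function name percent_identity is hypothetical, and the figures used are the counts given in the text (11 differences in 136 residues for the cssA product and 5 differences in 146 residues for the cssB product).

```python
def percent_identity(differences, length):
    """Percent of aligned residues that are identical."""
    if length <= 0 or not 0 <= differences <= length:
        raise ValueError("differences must lie between 0 and length")
    return 100.0 * (length - differences) / length


# (differences, residues) as stated in the text for the two strains.
reported = {"cssA product": (11, 136), "cssB product": (5, 146)}
for name, (diff, length) in reported.items():
    print(f"{name}: {percent_identity(diff, length):.1f}% identical")
```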

A region of dyad symmetry is present 6 bases downstream from
cssB in both clones. The sequence is GGCCGCA TTAT TGCGGCC (Sequence
#2) in E8775 ETEC and GGCCGCA TTATTGA TTGCGGCC (Sequence #3) in
E10703. Underlined bases form the G-C rich stem. The calculated
free energy value of these structures is -14.8 kcal. Such
structures are often found in fimbrial operons after the genes
encoding structural proteins.
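
Dyad symmetry of this kind means that one arm of the stem is the reverse complement of the other. The following minimal sketch checks this for the two sequences quoted above; it does not calculate free energy, and the function names revcomp and stem_length are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def stem_length(seq):
    """Longest k such that the first k bases pair with the last k bases."""
    for k in range(len(seq) // 2, 0, -1):
        if seq[:k] == revcomp(seq[-k:]):
            return k
    return 0


# The two sequences quoted in the text, written without spaces.
hairpins = {
    "E8775": "GGCCGCATTATTGCGGCC",
    "E10703": "GGCCGCATTATTGATTGCGGCC",
}
for strain, seq in hairpins.items():
    stem = stem_length(seq)
    print(strain, "stem:", stem, "loop:", len(seq) - 2 * stem)
```

Both sequences share the same GGCCGCA/TGCGGCC arms and differ only in the length of the intervening loop.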

cssC begins 48 bases downstream from cssB. It has a typical
signal sequence. The deduced proteins from both clones have 212
residues with 7 differences. A search of protein databases
indicated CssC is homologous to chaperone proteins necessary for
expression of a number of fimbriae. The structure of PapD, the
chaperone protein for Pap fimbriae, has been solved by X-ray
crystallography and regions important for conserving the structural
domains have been identified. CssC conforms to this consensus.

The cssD gene begins 14 bases upstream of the end of cssC. 
The protein from E8775 is truncated relative to the protein from 
E10703 and there are 28 differences between CssD from E8775 and 
E10703. The deduced protein from cssD is homologous to molecular 
ushers. Overall, CssD and the other proteins are only around 30%
identical and around 50% similar, but the nine proteins have areas
of high homology dispersed throughout, especially the first 410
residues, and 4 cysteines (residues 91, 112, and two near the
C-terminus) which are conserved in all ushers.

A region of dyad symmetry is present 347 base pairs into the
cssD gene in both clones. The calculated free energy value of
these structures is -7.2 kcal.

The plasmid from strain M285 was transformed into E. coli
HB101 purchased from ATCC. The resulting strain was named M295.
Expression of CS6 from M295 was achieved from small-scale
fermentation. For production for human use, it was desirable to add a
gene for resistance to kanamycin as the selectable marker. To that
end, a vector was constructed based on pUC19 but with a gene for
kanamycin resistance in place of the gene for ampicillin
resistance. The CS6 genes from the pUC19 clone were subcloned into the
new vector and transformed into E. coli HB101.

Vector pM323 was constructed as follows. The kanamycin
resistance gene was purchased from Pharmacia, Uppsala, Sweden
(KanR GenBlock®) and inserted into a cloning vector by Dr. David
Lanar at WRAIR. DNA including the gene was amplified by PCR using
the plasmid from Dr. Lanar as template and primers flanking the
multiple cloning site. A product of the desired size (1,580 bp)
was obtained, but with much template present. To increase the
purity of the 1,580 bp fragment, a second PCR reaction was performed,
this time with a small amount of the first PCR reaction as
template. This product was confirmed by agarose gel electrophoresis,
then digested with restriction enzyme HincII to remove
unwanted restriction enzyme recognition sites. This product was
ligated to pUC19 digested with SspI. The ligation mix was
transformed into E. coli DH5α and plated on L agar plates
supplemented with kanamycin and Xgal. Isolate M318 had the desired
phenotype of resistance to kanamycin and ampicillin with lacZ'
intact. The gene for ampicillin resistance was removed to make a
smaller vector. This was achieved by designing and synthesizing 2
oligonucleotides to amplify just the portion of pM318 with the gene
for kanamycin resistance, the lacZ' gene carrying the multiple
cloning site, and the origin of replication. PCR was performed, and
the product was ligated and then transformed into E. coli DH5α with
selection on L agar plates supplemented with kanamycin and Xgal.
Isolate M323 had the desired phenotype of resistance to kanamycin,
sensitivity to ampicillin, and intact lacZ'. Restriction digest
patterns confirmed the plasmid was a derivative of pUC19 with the
gene for kanamycin resistance in place of the gene for ampicillin
resistance.

The CS6 genes were cloned into vector pM323 from pM285. pM323
and pM285 were digested with restriction enzyme SstI, ligated, and
transformed into E. coli DH5α with selection on L agar plates
supplemented with kanamycin and Xgal. Isolate M334 was determined
to express CS6. Plasmid analysis revealed M334 carried the CS6
genes and 2 copies of the vector. An attempt was made to remove
one copy of vector and at the same time move the clone into HB101,
the desired host strain for fermentation. Isolate M340 was
determined to express CS6 and retained 2 copies of the vector. An
isolated colony of M340 was shown to produce high amounts of CS6
and was saved as M346.

In another embodiment lacking the kanamycin resistance gene,
clones from an ETEC strain of serotype O25:H42 were derived from E.
coli E8775 which was originally isolated from samples from
Bangladesh. E. coli M56, which contains a 61-megadalton plasmid
from E8775 tox- and expresses CS6, has been described. The host for
cloning was E. coli DH5α which was purchased from Bethesda Research
Laboratories, Inc., Gaithersburg, MD. The host for plasmids used
for production of heat, saline extracts was HB101 (EMBO J.
4:3887-3893 (1985)).

Clones from E8775 were routinely grown in L broth. Antibiotics
were added to L broth supplemented with agar as follows.
Ampicillin was added, when appropriate, at 50 µg/ml.
Chloramphenicol was used at 30 µg/ml. X-Gal (5-bromo-4-chloro-3-indolyl
β-D-galactopyranoside, Sigma) was added at 0.004%. CFA plates were
prepared as previously described (Infect. Immun. 57:164-173 (1989)).

Cloning CS6 from E8775. The 61-megadalton plasmid from E.
coli M56 was partially digested with HindIII and ligated to pUC19
that had been digested with HindIII. The ligation mixture was
transformed into E. coli DH5α and plated onto L agar plates
supplemented with ampicillin and X-gal. White (lac-) colonies were
picked to CFA plates supplemented with ampicillin and tested for
CS6 expression using antisera as described below. Plasmids were
purified as described (Infect. Immun. 57:164-173 (1989)).
Restriction enzymes were used according to the manufacturer's directions.

Detection of CS6 Expression. CS6 expression by bacterial
colonies was detected after transfer to nitrocellulose and
treatment as described by Mierendorf (Methods Enzymol. 152:458-469
(1987)). Primary antiserum was specific for CS6 and was raised in
rabbits and absorbed as previously described (Infect. Immun.
57:164-173 (1989)), except that rabbits were inoculated
intravenously with live bacteria suspended in normal saline. Secondary
antibody was peroxidase-conjugated goat anti-rabbit IgG (Cappel
Laboratories, Cochranville PA) and detection was by TMB Substrate
(Kirkegaard & Perry Laboratories, Inc., Gaithersburg MD).

Positive identification of CS6 was by Western blots of heat,
saline extracts. Heat, saline extracts were prepared from bacteria
grown on the indicated media as described (Infect. Immun. 27:657-666
(1980)). Proteins were separated on precast 16% Tricine sodium
dodecyl sulfate-polyacrylamide gels (SDS-PAGE, Novex Novel
Experimental Technology, San Diego, CA) and transferred to
nitrocellulose. Blots were handled as described above for colony blots.

Determination of N-terminal sequence. Heat, saline extracts
were obtained from E8775 or clones of E8775 grown on L agar or CFA.
Partial purification of CS6 was obtained by ammonium sulfate
precipitation, with extracts sequentially precipitated at 20%, 40%,
then 60% saturation. Samples at 40% and 60% saturation were
dialyzed against deionized water and loaded onto precast 16%
Tricine SDS-PAGE (Novex, San Diego, CA). Proteins were blotted
onto polyvinylidene difluoride (PVDF) membranes (Westrans,
Schleicher & Schuell, Keene, NH), stained by Coomassie blue (Rapid
Coomassie Stain, Diversified Biotech, Newton, MA) and bands of
approximately 16 kDa were excised for automated gas-phase
N-terminal sequencing analysis (Applied Biosystems Model 470A, Foster
City, CA). Data were analyzed using the Model 610A Data Analysis
Program, Version 1.2.1 (Applied Biosystems, Inc., Foster City, CA).
These methods have been described in detail (Infect. Immun.
60:2174-2181 (1992)).

DNA sequencing. DNA sequencing of the clones derived from
E8775 was performed using the Model 373A DNA sequencing system from
Applied Biosystems, Inc., Foster City, CA. Reactions were performed
using the dideoxy method with fluorescent dye-labeled terminators,
double-stranded templates, oligonucleotide primers, and AmpliTaq
DNA polymerase following the manufacturer's protocol. Appropriate
oligonucleotide primers were synthesized using a Model 391 DNA
Synthesizer (Applied Biosystems, Inc., Foster City, CA). Plasmids
were purified for use as templates by a slight modification of the
alkaline lysis method and cesium chloride density gradient
centrifugation described by Maniatis (Molecular Cloning: a
Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring
Harbor (1982)). Plasmids were recovered by dialysis followed by
multiple ethanol precipitations to remove residual salt. Sequence
analysis was performed using software developed by the University
of Wisconsin Genetics Computer Group (Nucleic Acids Res. 12:387-395
(1984)).
RESULTS 

CS6 genes cloned from ETEC strain E8775 into pUC19. A stable
clone named M233 was obtained from a partial digest of the
61-megadalton plasmid from E. coli M56. It was a spontaneous deletion
of a larger clone. The insert in M233 was approximately 24 kb.
Subclones were obtained by digestion with various enzymes, and a
subclone containing 4.9 kb from the HindIII site to the KpnI site was
found that expressed CS6. This clone was designated M285. Expression of
CS6 was verified by transferring plasmids into E. coli HB101 and
detecting CS6 in heat, saline extracts. The cloned CS6 is
expressed under the same conditions as CS6 from the native
61-megadalton plasmid (Table 1). CS6 was detected in western blots of
heat, saline extracts of bacteria grown on CFA, L agar or MacConkey
agar. CS6 was not expressed on bacteria grown at 17 °C.






Table 1. Regulation of CS6 Expression

Strain:      M287
Plasmid:     pM285
Chromosome:  HB101

Media
CFA 37 °C    +
CFA 17 °C
L agar       +
MacConkey    +

M56
native
HB101

+
+

E8775
native
native

HB101
none
HB101


N-terminal sequence of CS6. The protein sequence of the
N-terminus of CS6 was determined from strain E8775 and from M233,
the large clone derived from E8775. The 16 kDa proteins recovered 
from heat, saline extracts, and ammonium sulfate precipitation of 
M233 yielded two amino acids at each position (except cycle 12) 
indicating that two proteins were present. From the strength of 
the two signals, a probable primary sequence and a probable 
secondary sequence call was made for each of fifteen cycles. 
Quantitative analysis of the peak areas indicated that the molar 
ratio of the primary sequence (CS6A) to secondary sequence (CS6B) 
was approximately 3:1. The presence of the same two proteins was 
evident from strain E8775 grown on CFA agar and on L agar. 

DNA sequence of CS6 operons. The sequences of DNA cloned from
E8775 (in M285) were determined. They are available from GenBank
accession number U04846. The DNA sequence, when compared with
sequences from another strain, was found to diverge abruptly on
both sides of the common area. Four open reading frames were
detected. These were designated cssA, cssB, cssC, and cssD for CS
six.

The GC content of the DNA is 34% and the codon usage is in the
range found for Escherichia coli genes that are expressed at low or
very low levels as defined by Osawa et al. (Prokaryotic Genetic
Code. Experientia 46:1097-1106 (1990)).
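
As an illustration of how a GC content figure such as the 34% quoted above is obtained, the following is a minimal sketch in Python. The function name gc_content is hypothetical and is not part of the software cited in this application; the example input is only the first 60 bases of the cssA open reading frame as printed in this document, so its value need not equal the operon-wide figure.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the A, C, G, T bases of seq."""
    # Ignore spaces, line breaks, and any non-base characters.
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# First 60 bases of the cssA open reading frame as printed in this document.
cssa_start = "ATGAAGAAAACAATTGGTTTAATTCTAATTCTTGCTTCATTCGGCAGCCATGCCAGAACA"
print(f"GC content of this fragment: {gc_content(cssa_start):.1f}%")
```

Applied to the complete operon sequence, the same calculation would give the overall GC content that is compared here with the typical range for E. coli genes.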

Genes encoding CS6 structural proteins. The four open reading
frames are preceded by consensus sequences for binding RNA
polymerase and ribosomes. The DNA and deduced amino acid sequence of
cssA, a CS6 structural protein, are shown below. The DNA sequence of
the entire operon is available from GenBank accession number U04844.
The deduced amino acid sequence from E8775 is given. The arrow
indicates the site of cleavage of the signal peptide. The protein
sequence is associated with the sequence for the second construct:

-35                     -10                        RBS

TTGACACATTACGAATGTTATGTATACAATAAAAATGATTATAGCAATATTAATGGTGTTAT 

ATGAAGAAAACAATTGGTTTAATTCTAATTCTTGCTTCATTCGGCAGCCATGCCAGAACA 
MKKTIGLILILASFGSHART 2 

GAAATAGCGACTAAAAACTTCCCAGTATCAACGACTATTTCAAAAAGTTTTTTTGCACCT 
EIATKNFPVSTTISKSFFAP 22 

GAACCACGAATACAGCCTTCTTTTGGTGAAAATGTTGGAAAGGAAGGAGCTTTATTATTT 
EPRIQPSFGENVGKEGALLF 42

AGTGTGAACTTAACTGTTCCTGAAAATGTATCCCAGGTAACGGTCTACCCTGTTTATGAT
SVNLTVPENVSQVTVYPVYD 62

GAAGATTATGGGTTAGGACGACTAGTAAATACCGCTGATGCTTCCCAATCAATAATCTAC
EDYGLGRLVNTADASQSIIY 82

CAGATTGTTGATGAGAAAGGGAAAAAAATGTTAAAAGATCATGGTGCAGAGGTTACACCT
QIVDEKGKKMLKDHGAEVTP 102

AATCAACAAATAACTTTTAAAGCGCTGAATTATACTAGCGGGGAAAAAAAAATATCTCCT
NQQITFKALNYTSGEKKISP 122

GGAATATATAACGATCAGGTTATGGTTGGTTACTATGTAAACTAA (Seq. #4)
GIYNDQVMVGYYVN* (Seq. #5) 136
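
The correspondence between the DNA lines and the deduced amino acid lines above can be checked by translating each DNA line with the standard genetic code. The sketch below is illustrative only; the names CODON_TABLE and translate are hypothetical, and it assumes the reading frame starts at the first base of each printed line.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; stop codons are shown as '*'."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# First and last DNA lines of the cssA open reading frame printed above.
first_line = "ATGAAGAAAACAATTGGTTTAATTCTAATTCTTGCTTCATTCGGCAGCCATGCCAGAACA"
last_line = "GGAATATATAACGATCAGGTTATGGTTGGTTACTATGTAAACTAA"
print(translate(first_line))
print(translate(last_line))
```

Running the same function over the remaining DNA lines gives a quick way to confirm each printed protein line against its DNA line.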

The first open reading frame, cssA, was identified as the gene for
the CS6 structural protein CS6A, designated as the primary protein
based on the amino acid N-terminal sequence. cssA includes a
signal sequence that is typical for many exported proteins. The
deduced CssA protein from E8775 has 136 residues, as shown above
and in Table 2. The molecular weight agrees with that previously
determined from SDS-PAGE. No homologous proteins were found by
searching the protein databases, but conserved residues are present
near the C-terminus, and this is typical of fimbrial subunits that
are carried across the periplasm by chaperones.

Table 2. Characteristics of Proteins Deduced from CS6 Operons

                         Number of   Molecular   Isoelectric
Protein       Source     Residues    Weight      Point

CssA (CS6A)   E8775      136         15,058      5.27
CssB (CS6B)   E8775      146         15,877      4.40
CssC          E8775      212         24,551      10.24
CssD          E8775      802         90,393      9.97

cssB begins 17 bases downstream from cssA. There is a typical
signal sequence. cssB was identified as the gene for the CS6 
structural protein CS6B designated as the secondary protein based 
on the amino acid N-terminal sequence. The C-terminus matches the 
consensus typical of fimbrial subunits. The sequence from E8775 is 
given. The arrow indicates the site of cleavage of the signal 
peptide. 

MLKKIISAIA LIAGTSGWNA GNWQYKSLDV NVNIEQNFIP DIDSAVRIIP  30
                     ↑
VNYDSDPKLD SQLYTVEMTI PAGVSAVKIA PTDSLTSSGQ QIGKLVNVNN  80
PDQNMNYYIR KDSGAGNFMA GQKGSFPVKE NTSYTFSAIY TGGEYPNSGY 130
SSGTYAGNLT VSFYSN                                      146 (Seq. #6)

A region of dyad symmetry is present 6 bases downstream from
cssB in both clones. The sequence is GGCCGCA TTAT TGCGGCC (Seq. #2)
in E8775 ETEC. The flanking bases GGCCGCA and TGCGGCC form the
GC-rich stem.
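
Dyad symmetry of this kind means that one arm of the sequence is the reverse complement of the other, so that the two arms can pair as a stem around the central loop. The following is a minimal sketch of that check; the function names are hypothetical, and the split of Seq. #2 into two 7-base arms and a 4-base loop follows the spacing printed above.

```python
# Watson-Crick complement for uppercase DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_inverted_repeat(left_arm: str, right_arm: str) -> bool:
    """True if the two arms can base-pair along their full length."""
    return reverse_complement(right_arm) == left_arm.upper()


# Seq. #2 as printed above, split into stem arm, loop, stem arm.
left_arm, loop, right_arm = "GGCCGCA", "TTAT", "TGCGGCC"
print(is_inverted_repeat(left_arm, right_arm))
```

The check only confirms perfect complementarity of the arms; it does not estimate the stability of the resulting structure.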

Genes with homology to fimbrial accessory proteins. cssC
begins 48 bases downstream from cssB. It has a typical signal
sequence. The deduced proteins from both clones have 212 residues
with 7 differences. A search of protein databases indicated CssC
is homologous to chaperone proteins necessary for expression of
Pap, CS3, K88, K99, CS31A, S, and Type 1 fimbriae of E. coli and
SEF14 of Salmonella enteritidis, F1 and pH6 antigen of Yersinia
pestis, Type 3 of Klebsiella pneumoniae, Type b of Haemophilus
influenzae, and filamentous hemagglutinin of Bordetella pertussis.
The structure of PapD, the chaperone protein for Pap fimbriae, has
been solved by X-ray crystallography and regions important for
conserving the structural domains have been identified. CssC
conforms to the following consensus. Below is the deduced amino
acid sequence of cssC. The * indicates conservative amino acid
replacements. Dots are gaps necessary for aligning all sequences.
Boxes indicate beta strands as defined for PapD. The designations
of the beta strands for domain 1 (A1 through G1) and domain 2
(A2 through G2) are given below each box.



(Consensus marks and box outlines of the original figure are not
reproduced. The beta strands labeled in each row are listed below
the row.)

NNF EINKTRVIYS DSTP SVQISNN KAYP.. LIIQSNVWDES NNKNH..D FIATPPIFKM
    beta strands: A1, B1, C1, D1

ESES RNIIKIIK TTI..NLPDSQE SMRWLCIESM PPIEKST..KINRKEGRTDSINISI  110
    beta strands: E1, F1

RGCIKLIYR PASVPSPVFNN.IVEK LKWHK NGKY LVLKN NTPYY ISFSEVF  160
    beta strands: G1, A2, B2, C2

FDSDKV..NNAKD ILYVK PY SEKKID ISN..RIIKKI KWAMI DDAGAKT KLYESIL  (Seq. #7)
    beta strands: D2, E2, F2, G2

cssD begins 14 bases upstream of the end of cssC. When
compared with a second sequence there are 28 differences between
CssD from E8775 and the other sequence. The deduced protein from
cssD is homologous to molecular ushers found in operons of Pap,
CS3, K88, K99 and Type 1 fimbriae of E. coli and SEF14 of
Salmonella enteritidis, F1 of Yersinia pestis, and Type 3 of Klebsiella
pneumoniae. Overall, CssD and the other proteins are only around
30% identical and around 50% similar. Asterisks above the CssD
sequence indicate amino acids conserved relative to molecular
ushers.
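
The distinction between percent identity and percent similarity can be illustrated with a short sketch. It assumes two sequences that have already been aligned to equal length, with '.' or '-' marking gaps as in the figures of this document. The grouping of amino acids treated as conservative replacements (SIMILAR_GROUPS) is an assumption chosen for illustration, not the scoring scheme used by the software cited in this application, and all function names are hypothetical.

```python
GAP_CHARS = ".-"
# Assumed groups of residues counted as conservative replacements.
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ", "AG"]


def _compared_pairs(a: str, b: str) -> list:
    """Return aligned residue pairs, skipping columns that contain a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in GAP_CHARS and y not in GAP_CHARS]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return pairs


def _similar(x: str, y: str) -> bool:
    """True if residues are identical or share an assumed group."""
    return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)


def percent_identity(a: str, b: str) -> float:
    pairs = _compared_pairs(a, b)
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def percent_similarity(a: str, b: str) -> float:
    pairs = _compared_pairs(a, b)
    return 100.0 * sum(_similar(x, y) for x, y in pairs) / len(pairs)


# Hypothetical aligned fragments, for illustration only.
seq_a = "KAYP..LIIQSN"
seq_b = "KSYPGGLVVESN"
print(percent_identity(seq_a, seq_b), percent_similarity(seq_a, seq_b))
```

Similarity is always at least as high as identity under this definition, which is consistent with the pattern of roughly 30% identity and 50% similarity reported above.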

(Asterisks and consensus residues above the sequence in the original
figure are not reproduced. A ? marks a residue that is illegible in
the source.)

MNQFYKKSHYSIQKHQITGLLFLLFIYPFSTSYGNEQFSFDSRFLPSGYN  50
YSLNSNLPPEGEYLVDIYINKIKKESAIIPFYIKGNKLVPCLSKEKISSL 100
GININNNDNTECVETSKAGISNISFEFSSLRLFIAVPKNLLSEIDKISSK 150
DIDNGIHALFFNYQVNTRLANNKNRYDYISVSPNINYFSWRLRNLFEFNQ 200
NNDEKTWERNYTYLEKSFYDKKLNLWGESYTNSNVYNNYSFTGISVSTD 250
TDMYTPSEIDYTPEIHGVADSDSQIIVRQGNTIIINESVPAGPFSFPITN 300
LMYTGGQLNVEITDIYGNKKQYTVNNSSLPVMRKAGLMVYNFISGKLTKK 350
NSEDGDFFTQGDINYGTHYNSTLFGGYQFSKNYFNLSTGIGTDLGFSGAW 400
LLHVSRSNFKNKNGYNINLQQNTQLRPFNAGVNFDYAYRKKRYVELSDIG 450
WHGNLYNQLKNSFSLSLSKSLNKYGNFSLDYNKMKYWDNAYDSNSMSIRY 500
FFKFMRAMITTNCSLNKYQSYEKKDKRFSINISLPLTKDYGHISSNYSFS 550
NANTGTATSSVGLNGSFFNDARLNWNIQQNRTTRNNGVTDNTSVIATSYA 600
SPYGVFTGSYSGSNKYSSQFYSASGGIVLHSDGVAFTQKAGDTSA?VRID 650
NISDIKIGNTPGVY?GYNGFA?IPHLQPFKKiItJlJnDKgJpdGItLnI 700
KKQVIPSRGAIVKVKFDAKKGNDILFKLTTKDGKTPPL?AIAHEKNGKQI 750
NTGIVDDDGMLYMSGLSGTGIINVTWNGKVCSFPFSEKDISSKQLSWNK 800
802 (Seq. #8)

By comparison with the protein from another strain, the sequence
data show that the proteins have areas of high homology dispersed
throughout, especially in the first 410 residues. CssD has 4
cysteines (residues 91, 112, and two near the C-terminus), which are
conserved in all ushers.

A region of dyad symmetry is present 347 base pairs into the
cssD gene in both clones. The calculated free energy value of
these structures is -7.2 kcal.

DNA flanking the CS6 genes. When compared with another
strain, the DNA sequences of the two clones diverge immediately
downstream of cssD and 96 bases upstream of cssA. The
non-homologous flanking regions have homology with five distinct
insertion sequences. The homologies include 3% to 32% of each
insertion sequence but not entire insertion sequences. The
homology of Iso-IS1 in E8775 continues beyond the clones we
have sequenced and may be a complete insertion sequence in the
native plasmids.



It should be noted that minor variation in bases of the
peptides does not destroy antigenicity. A protein having at least
60% homology with the CS6 A and B proteins identified herein and
having conservative substitutions would be expected to have desirable
properties.

As indicated previously, bacteria transformed with plasmids 
which express the CS6-A and CS6-B proteins may be administered by 
mouth. If the transformed bacteria are attenuated strains, they 
may be delivered live. It is also possible to administer killed
bacteria. Carbonated beverages such as carbonated water are 
particularly useful as carriers which are inexpensive. When the 
bacteria are administered in a carrier wherein the pH is not over 
7, an antacid may be given with the bacteria. 

The CS6 A and CS6 B proteins may also be at least partially 
purified and administered by mouth by means usually used in the art 
to deliver antigens to the intestinal tract, including in protected 
forms such as liposomes, microcrystals, microdroplets, as
microencapsulated formulations or as enterically coated capsules.